ABSTRACT mRNA profiling of pathogens during the course of human infections gives detailed information on the expression 
levels of relevant genes that drive pathogenicity and adaptation and at the same time allows for the delineation of phylogenetic 
relatedness of pathogens that cause specific diseases. In this study, we used mRNA sequencing to acquire information on the ex- 
pression of Escherichia coli pathogenicity genes during urinary tract infections (UTI) in humans and to assign the UTI- 
associated E. coli isolates to different phylogenetic groups. Whereas the in vivo gene expression profiles of the majority of genes 
were conserved among 21 E. coli strains in the urine of elderly patients suffering from an acute UTI, the specific gene expression 
profiles of the flexible genomes were diverse and reflected phylogenetic relationships. Furthermore, genes transcribed in vivo relative
to laboratory media included well-described virulence factors, small regulatory RNAs, as well as genes not previously
linked to bacterial virulence. Knowledge of relevant transcriptional responses that drive pathogenicity and adaptation of isolates
to the human host might lead to the introduction of a virulence typing strategy into clinical microbiology, potentially facilitating
management and prevention of the disease.

IMPORTANCE Urinary tract infections (UTI) are very common; at least half of all women experience UTI, most of which are 
caused by pathogenic Escherichia coli strains. In this study, we applied massive parallel cDNA sequencing (RNA-seq) to provide
unbiased, deep, and accurate insight into the nature and the dimension of the uropathogenic E. coli gene expression profile during
an acute UTI within the human host. This work was undertaken to identify key players in physiological adaptation processes
and, hence, potential targets for new infection prevention and therapy interventions specifically aimed at sabotaging bacterial
adaptation to the human host. 
To successfully thrive in the host environment during the
course of an infection, pathogens have to rapidly adapt to the
specific conditions encountered. Thus, a key to understanding
microbial pathogenesis lies in knowledge of which genes are ex- 
pressed to initiate and maintain the infection and of the global 
impact of the host environment on the transcriptional profile of 
the pathogen (1). Urinary tract infections (UTI) are one of the 
most common bacterial infections worldwide, and most of them 
(over 80%) are caused by uropathogenic Escherichia coli (UPEC)
(2). It is widely accepted that UPEC strains originate from the
distal gut microbiota, where they mostly behave as commensals
(3), although UPEC strains are armed with extra virulence genes
(4). Those virulence genes are often present on strain-specific
pathogenicity islands (PAIs), which are clusters of virulence-related
genes (5-7). PAIs are diverse in content and genome location
and, as sequence information on more examples of the
islands accumulates, greater insights into their role in disease can
be expected (8, 9).

UTI is recognized by the presence of bacteria in urine (bacteriuria).
During the course of infection, bacterial cells attach
to human epithelial cells, utilizing chaperone usher (CU) fimbriae
that contain adhesins on their tips (10). Adhesion mediated by the
prototypical CU type 1 fimbriae can lead to intracellular invasion of bladder
epithelial cells (11). UPEC strains are known to enter the cyto- 
plasm and form biofilm-like structures called intracellular bacte- 
rial communities (IBC) (12). After maturation of IBC, the UPEC 
cells can disperse into urine, or as part of the host response the 
infected epithelial cells may be exfoliated and released into urine. 
Exfoliated cells are replaced with transitional epithelial cells, which
may also be invaded by UPEC; there, the bacteria form quiescent intracellular
reservoirs (QIR) characterized by their persistence and
antibiotic resistance (13).

In vitro studies and various animal models have been valuable 
for exploring UPEC pathogenesis (14, 15) and have led to signif- 
icant advances in understanding key pathogenicity mechanisms 
(16-22). Knowledge of UPEC gene expression during naturally 
occurring UTI will further add to the full understanding of micro- 
bial pathogenesis of this widespread bacterial pathogen. Indeed, 
investigation of complex transcriptional adaptation processes of 
UPECs to the human host is expected to uncover key regulatory 
components and to provide unique insight into bacterial patho- 
genicity (23). Furthermore, the identification of E. coli virulence 
genes associated with UTIs is potentially valuable in differentiat- 
ing UPEC from nonuropathogenic E. coli and might lead to the 
introduction of virulence typing strategies into clinical microbiology.

Today, the significant advances in next-generation sequencing 
technologies enable unbiased and very accurate quantitative 
annotation-independent detection of transcripts at high resolu- 
tion (24). Furthermore, RNA sequencing (RNA-seq) can be used 
to extract genotype information from the cDNA at single-nucleotide
resolution, providing profound insights into phylogenetic
relatedness. Although RNA sequencing studies have been
widely used for quantitative and qualitative transcriptional profil- 
ing of various bacterial pathogens (24-30), the application of 
RNA-seq to determine global transcriptional profiles during the 
infection of the human host has remained very limited. 

In this study, we used strand-specific RNA-Seq to generate 
comprehensive in vivo transcriptional profiles of 21 UPEC strains
causing symptomatic UTI in a cohort of elderly patients and 
gained profound insights into the conservation/variation of tran- 
scription patterns across UPEC isolates that exhibited a broad 
phylogenetic distribution. While most known UPEC virulence 
factors could be identified, comparison of the in vivo transcrip- 
tional profiles uncovered a set of genes that is specifically tran- 
scribed during the course of an infection and which cannot be 
inferred from analyzing genomes or from transcriptional profiles 
of UPEC isolates recorded under laboratory culture conditions. 

RESULTS 

Broad phylogenetic distribution of E. coli UTI isolates isolated
from elderly patients. With the aim to record in vivo transcriptional
profiles of UPEC strains, urine samples were collected from
outpatients with symptomatic UTI prior to antibiotic treatment.
Overall, 21 urine samples were included in this study. All of them
were culture positive on MacConkey agar plates, with more than 
10^ E. coli CFU/ml urine in pure cultures, and microscopic inspec- 
tion of urine sediments revealed the presence of massive numbers 
of neutrophils (>100/μl). The 21 patients were mainly elderly
(mean age above 60 years, with only 4 patients being younger than 
60 years), 8 were male, and 13 were female. RNA isolation procedures
and strand-specific Illumina-based RNA sequencing of bacterial
mRNA were performed, and the raw sequence output after
the removal of reads that mapped to the human genome consisted 
of 61.01 million reads. Thus, on average, 2.9 million reads were 
retrieved from each of the 21 samples. In accordance with the 
finding that the gene content between pairs of E. coli genomes may
diverge by more than 30%, the range of gene numbers to which
those reads mapped was between 3,848 and 4,972.

FIG 1 Phylogenetic tree of 54 previously sequenced strains and the 21 clinical
isolates from this work (in italic) based on sequence variation within 336
genes. Phylogenetic groups are indicated based on previous reports (34, 35).
The numbers show the bootstrapping values as provided by RAxML.

In E. coli, <3% of nucleotide divergence is found among con- 
served genes in the various genomes (6). This high degree of ho- 
mogeneity allows the establishment of phylograms that are built 
upon sequence variations. Previous studies have identified five 
major phylogenetic groups (B2, B1, D, A, and E), corresponding
to E. coli strains with distinct capability to cause disease and to 
inhabit various ecological niches (31-36). Figure 1 depicts the 
phylogenetic distribution of previously sequenced E. coli isolates 
that have been grouped into the five phylogenetic E. coli groups. 
This tree is based on sequence variations of 336 genes (for those 
genes, at least 80% sequencing coverage across the 21 UTI isolates
was detected), which allowed us to use the genotype information
from the RNA-seq data of the E. coli genomes to assign the 21 UTI
isolates of this study to the clusters within the phylogenetic tree 
(Fig. 1). Reflecting the fact that our study group consisted mostly 
of elderly patients, we found a broad distribution of the 21 UTI- 
associated isolates between the phylogenetic groups. A total of 
43% of the 21 isolates belong to the virulent E. coli strain phylogroups
B2 and D (B2, 33%; D, 10%), whereas the others are distributed
in the B1 (38%) and A (19%) phylogroups.
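
The reported phylogroup shares can be checked with a few lines of arithmetic. The sketch below assumes per-group isolate counts of 7 (B2), 2 (D), 8 (B1), and 4 (A). These counts are not given in the text; they are inferred here from the rounded percentages and from the 12 A/B1 isolates and 9 other isolates mentioned later in the Results.

```python
# Hypothetical per-phylogroup isolate counts, inferred from the rounded
# percentages in the text (they are not reported directly).
counts = {"B2": 7, "D": 2, "B1": 8, "A": 4}


def phylogroup_percentages(counts):
    """Return the share of isolates per phylogroup, rounded to whole percent."""
    total = sum(counts.values())
    return {group: round(100 * n / total) for group, n in counts.items()}


percentages = phylogroup_percentages(counts)
# Combined share of the phylogroups described as virulent (B2 and D).
virulent_share = percentages["B2"] + percentages["D"]
print(percentages, virulent_share)
```

Under these assumed counts, the rounded shares reproduce the values quoted in the paragraph above.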

Commonly transcribed genes of the E. coli UTI isolates ex- 
hibit a conserved expression profile. With the aim to uncover the 
full extent of the in vivo gene expression profile of the 21 clinical
E. coli isolates, we mapped all obtained Illumina sequencing reads
to a list of 12,331 nonredundant E. coli genes. This list of genes was
generated by the comparative genomic analysis of 54 previously 
fully sequenced E. coli genomes (see Materials and Methods). The 
entire list, including ortholog identifiers (IDs) as well as the expression
values of the 21 UTI samples, is provided in Data Set S1 in
the supplemental material. This list includes 2,129 genes shared by 
all 54 strains and 10,202 genes that are absent in at least one of the 
54 strains. Among the latter, 3,257 genes were found in only one of 
the 54 published genomes as singletons. Only very few genes hav- 
ing homologs in all 54 sequenced E. coli isolates were not tran- 
scribed in any of the 21 isolates under in vivo conditions, indicat- 
ing that expression of most of the core genome is relevant for 
bacterial replication in the human urinary tract. Furthermore, we 
found a large set of overall 2,589 genes that were commonly tran- 
scribed in all isolates during in vivo conditions, which — depend- 
ing on the genome size of the isolates — accounts for 52% to 67% 
of all transcribed genes within one isolate. As depicted in Fig. S1 in
the supplemental material, those commonly expressed 2,589 
genes appear to be unregulated or constitutively expressed, as the 
overall variation of the expression profiles among the isolates was 
low and the genes were expressed at a generally high level inde- 
pendently of their phylogenetic group specificity. As expected, 
many of these genes correspond to genes required for the maintenance
of basic cellular functions, such as DNA repair, ATP synthesis,
aminosugar metabolism, and protein transport (see Table S1
in the supplemental material).
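
The split of the nonredundant gene list into genes shared by all strains, genes absent from at least one strain, and singletons can be expressed as simple set logic. The following is a minimal sketch on invented toy data, not the comparative genomics pipeline used in the study; the function name, strain names, and gene IDs are all hypothetical.

```python
def partition_pan_genome(presence):
    """Split genes into core, flexible, and singleton sets.

    presence maps each gene ID to the set of strains that carry it.
    """
    strains = set().union(*presence.values())
    # Core genes are present in every strain.
    core = {gene for gene, carriers in presence.items() if carriers == strains}
    # Flexible genes are absent from at least one strain.
    flexible = set(presence) - core
    # Singletons are flexible genes found in exactly one strain.
    singletons = {gene for gene in flexible if len(presence[gene]) == 1}
    return core, flexible, singletons


# Toy presence/absence data for three made-up strains (illustrative only).
toy = {
    "geneA": {"s1", "s2", "s3"},
    "geneB": {"s1", "s2"},
    "geneC": {"s3"},
}
core, flexible, singletons = partition_pan_genome(toy)

# Consistency check on the gene counts reported in the text:
# 2,129 shared genes plus 10,202 flexible genes give the 12,331-gene list.
reported_total = 2129 + 10202
```

Note that singletons are a subset of the flexible genes, which matches the way the text counts the 3,257 singletons among the 10,202 genes absent from at least one strain.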

Since we found only a low variation in the expression levels of 
the genes commonly transcribed in all 21 E. coli isolates at the time 
of mRNA sampling, hierarchical clustering based on their tran- 
scriptional profiles did not reveal specific and distinct clusters. We 
also performed matrix-assisted laser desorption ionization-time 
of flight (MALDI-TOF) mass spectrometry biotyping to elucidate
whether protein fingerprints might uncover clusters that serve for 
the identification of phylogenetic relatedness. MALDI-TOF mass 
spectrometry (see Fig. S2) correctly classified our UTI E. coli iso- 
lates on the species level. However, a dendrogram based on 
Minkowski distances and group averages did not reveal distinct 
subgroups within our isolates that would correlate with the previously
identified phylogenetic groups B2, B1, A, and D. This may
reflect the fact that MALDI-TOF mass spectrometry covers mostly 
housekeeping proteins, e.g., the ribosomal proteins, and therefore 
is ill suited to discriminate phylogenetic relationships. 

The in vivo gene expression profile of the E. coli UTI isolates 
correlates with phylogenetic group clustering. Mapping of all 
obtained Illumina sequencing reads to the list of 12,331 nonredundant
E. coli genes revealed, apart from the 2,589 commonly
transcribed genes (see above), a large fraction of genes (6,305
genes) that were expressed in at least one of the 21 UTI strains (see
Data Set S1).

Remarkably, clustering of the in vivo transcripts based on prin- 
cipal component analysis (PCA) of the 21 UTI isolates (Fig. 2), 
including commonly transcribed genes as well as those of the flex- 
ible genome, compared very well to that of phylogenetic clustering 
based on the single nucleotide polymorphism (SNP) profile 
(Fig. 1). The expression profile of the 21 UTI samples clustered 
into three main groups that represented the B2, D, and A/B1 phylogenetic
groups. Of note, clustering became even more accurate
and well separated when only the expression of genes of the flexible
genomes was included in the analysis (data not shown). These
results are in agreement with previous reports (37, 38) and clearly
demonstrate that the presence of group-specific gene repertoires,
and not a difference in overall gene expression profiles, drives the
clustering of the UTI isolates into the phylogroups.

FIG 2 Clustering of the in vivo transcripts of the 21 UTI isolates based on
principal component analysis (PCA). Clustering clearly reflects phylogenetic
relatedness as the clinical isolates grouped according to their affiliation to the
B2, D, and A/B1 phylogroups.
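
The idea that group-specific gene repertoires, rather than expression levels, separate the isolates can be illustrated with a simpler stand-in for PCA: a Jaccard distance computed on the sets of expressed genes. This is not the analysis performed in the study; the isolate names and gene sets below are invented toy data, and the helper functions are hypothetical.

```python
def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two sets of expressed gene IDs."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Toy expressed-gene sets; isolate names and gene assignments are invented.
expressed = {
    "iso_B2_a": {"core1", "core2", "sat", "usp"},
    "iso_B2_b": {"core1", "core2", "sat", "usp", "vat"},
    "iso_B1_a": {"core1", "core2", "paaK", "iroC"},
}


def nearest_neighbour(name, expressed):
    """Return the other isolate with the smallest Jaccard distance to `name`."""
    others = (n for n in expressed if n != name)
    return min(others, key=lambda n: jaccard_distance(expressed[name], expressed[n]))


closest_to_b2 = nearest_neighbour("iso_B2_a", expressed)
```

In this toy example the two isolates sharing the same flexible genes end up closest to each other even though all three share the same core genes, which mirrors the argument made in the text.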

We also performed a de novo assembly of reads from the 21 
isolates that did not match any of the 54 sequenced genomes, 
which resulted in the identification of 158 potential genes, 48 of 
which are organized in operon structures. A total of 105 of the 
genes have homologs in E. coli, and 53 have homologs in other 
Enterobacteriaceae (see Table S2). 

In vivo mRNA expression profiling of known UPEC viru- 
lence factors. Many of the genes found to be expressed in vivo in 
the 21 UTI isolates included known key E. coli virulence factors. 
Although we sampled voided bacteria, which are clearly distinct 
from attached and biofilm-grown bacteria, genes responsible for 
adhesion to the uroepithelium, e.g., type 1 fimbriae (fim) (16), P
fimbriae (pap), and F1C/S fimbriae (foc and sfa), were found (Table 1).
However, we did not find a uniform expression of any of those
common adhesion-related genes. Whereas no or only very low
expression of fimA, whose expression has been demonstrated to
enhance E. coli virulence in the urinary tract (16), could be detected
in 13 UPEC isolates in this study, the fimA gene and the
subsequent operon were highly expressed in 8 isolates. Additionally,
P fimbriae and F1C/S fimbriae-encoding genes were expressed
in a subset of isolates (5 and 3 isolates, respectively). Interestingly,
F1C/S fimbriae gene expression was exclusively found
in isolates which grouped to the phylogenetic B2 cluster.

Genes encoding iron acquisition systems were found to be 
widely expressed in vivo. The enterobactin and its transport 
system-encoding genes (ent and fep) were expressed in all UTI 
isolates without exception, whereas expression of aerobactin (iuc), 
yersiniabactin (irp), and salmochelin (iro) genes was less uniform.
Expression of the heme-mediated iron acquisition system (chu)
was present in 100% of isolates clustering with the D and B2 phylogenetic
groups and in 75% of the isolates clustering with group
A but absent in those clustering with group B1. Capsular polysaccharide
expression was observed in 100% of isolates clustering
with group D and B2, in 50% clustering with group A, and only
partially in one isolate of group B1. The expression of extracellular
toxin-encoding genes in vivo was less frequent. Overall, extracellular
toxin expression was most frequent in those UTI isolates that
clustered with strains from the B2 phylogenetic group. cnf1 and
hlyA expression was observed in only two and three isolates, respectively.
Expression of genes encoding serine protease autotransporter
PicU was present in five isolates, whereas the gene
encoding the serine protease autotransporter Sat was expressed in
four isolates, all of them clustering with the B2 phylogenetic
group. The vacuolating autotransporter toxin-encoding gene vat
was expressed in 8 isolates, and usp gene expression was found
exclusively in isolates that clustered with group B2 isolates. The
genes encoding the transport system of colicin V were detected in
5 isolates, and the clb operon encoded on the pks island (39) was
observed to be expressed in 3 isolates, again clustering with strains
of the B2 group.

TABLE 1 UPEC virulence genes present in the 21 clinical isolates

Result by phylogenetic group and UTI isolate no.

In vivo expression profiling of small regulatory RNAs. RNA- 
seq profiling enabled us to investigate expression of small regula- 
tory RNAs (sRNAs), which have been assigned central roles in 
virulence and environmental fitness (40, 41). Eleven sRNAs were 
identified that exhibited high in vivo expression levels in all or 
most of the 21 clinical UTI isolates (Fig. 3). 

Among them, ryiA (glmZ) and csrB exhibited the highest in 
vivo expression levels. The sRNA RyiA (GlmZ) activates glmS expression.
GlmS synthesizes glucosamine-6-phosphate (GlcN-6-P)
and thus delivers precursor molecules for the biosynthesis of pep- 
tidoglycan and lipopolysaccharides (LPS), which are essential el- 
ements of the Gram-negative bacterial cell wall (42, 43). Another 
sRNA that was found to be highly expressed during UTI was csrB. 
Both sRNAs csrB and csrC are modulatory components of the 
carbon storage regulatory (Csr) network. They contain multiple
CsrA binding sites, which permit them to sequester and antagonize
CsrA, a pleiotropic regulator of carbon metabolism (44-46).
Transcription of these two small RNAs is regulated by the BarA/ 
UvrY two-component signal transduction system (TCS) in E. coli 
or by homologous systems such as GacS/GacA in other bacteria 
(47). The Csr system (or the homolog RsmA/RsmZ) is present in 
many eubacteria and is known to be involved in mediating adap- 
tive physiology, timed virulence trait expression in animal patho- 
gens (48, 49), and biofilm formation (50, 51). Recently, it was 
shown to interact with the stringent response regulatory system 
(52). 

Although other sRNAs were also found to be expressed during
in vivo growth, their overall expression levels were often lower
than those observed in four representatives of our clinical isolates
that were cultivated in vitro under rich medium conditions until
late exponential growth phase. Apparently, a large number of
those highly in vitro-expressed sRNAs serve the adaptation to stationary
phase of growth (Fig. 3). Among those, we found micA, a
negative regulator of ompA (53), ryhA (arcZ), and rprA, encoding 
sRNAs that increase the translation of the stationary sigma factor 
RpoS (54, 55). Their expression, as well as the expression of sroC 
and ryeB, has been associated with stationary phase of growth (56, 
57). Additionally, the products of rprA, isrA (mcsA), and omrA 
were strongly expressed. Those sRNAs have been shown to nega- 
tively regulate the translation of CsgD, the major transcriptional 
regulator of E. coli curli biosynthesis (58, 59). 

Identification of infection-relevant gene expression profiles 
in the E. coli UTI isolates. In addition to known virulence factors, 
we aimed at identifying infection-relevant genes that are com- 
monly expressed in UPEC isolates in vivo. We therefore cultivated 
4 of the UTI isolates (isolates UTIU3 and UTIU5 clustering with
phylogroup B1, UTI24 clustering with group D, and UTI9 clustering
with group B2) in vitro under rich medium conditions and
recorded the transcriptional profiles. A total of 202 genes were
found to be upregulated under in vivo conditions in the 4 strains,
and all of those genes have been demonstrated to be expressed in
all 21 UTI isolates in this study under in vivo conditions. A detailed
list of the 202 commonly and exclusively in vivo expressed genes is
provided in Table S3 in the supplemental material (detailed data 
on all differentially expressed genes is given in Data Set S2A [up- 
regulated genes] and S2B [downregulated genes] in the supple- 
mental material). Whereas only 23 hypothetical or conserved hy- 
pothetical genes were found, use of the systematic functional 
annotation provided by Gene Ontology revealed that 20% of the 
genes belonged to functional groups involved in general biological
processes such as ATP synthesis and catabolic processes as well as 
transcription, translation, and DNA replication and repair. Fur- 
thermore, many genes were found to be involved in rRNA and 
tRNA processing, indicating that the bacteria rapidly grow in the 
human urinary tract. Consistent with the fact that main carbon 
sources for E. coli during UTI are peptides and amino acids, genes 
that belong to biological processes of proteolysis, protein trans- 
porters, carbohydrate metabolism, and fatty acid biosynthesis 
were represented. We also found that genes encoding enzymes of 
the pyruvate dehydrogenase complex were highly expressed. An- 
other large group of genes commonly expressed in vivo were genes 
involved in the regulation of bacterial cell shape and in bacterial 
stress responses, such as responses to toxic substances, including 
antibiotics. Furthermore, we found an in vivo overexpression of 
ampG encoding a peptidoglycan permease, which was shown to be 
involved in evasion of the host innate immune system during UTI 
(60). We also found gidA encoding the tRNA uridine 
5-carboxymethylaminomethyl modification enzyme (61) among 
the commonly in vivo expressed genes. gidA is known to affect
a number of virulence factors at the posttranscriptional level in
Pseudomonas syringae (62), Aeromonas hydrophila (63), Shigella 
flexneri (64), Streptococcus suis (65), Streptococcus pyogenes (66), 
Salmonella enterica serovar Typhimurium (67), and E. coli (68). 
Of note, the gidA gene was also shown to be upregulated in the 
majority of patients' samples from a previous study on UPEC 
transcriptomics (69) and in an earlier murine UTI in vivo gene 
expression study (14). Overall, our transcriptional data are remarkably
consistent with those previous reports (14, 69) and with
the transcriptional profile of E. coli isolated from patients with 
asymptomatic bacteriuria (ABU) (70). Expression of genes in- 
volved in nitrate/nitrite metabolism and nitric oxide (NO) pro- 
tection, upregulation of iron acquisition systems, and genes in- 
volved in carbohydrate and amino acid metabolism were 
commonly observed (14, 69, 70), reflecting bacterial adaptation to
the growth conditions encountered in the environment of the
urinary tract. Interestingly, for 3 (carA, carB, and argC) out of the
202 commonly highly expressed in vivo genes in our study, it was 
shown that their inactivation poses a competitive disadvantage to 
the respective mutants in the mouse urinary tract (71). These re- 
sults clearly suggest that their expression is crucial for growth in 
the urinary tract. 
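
The selection of commonly in vivo expressed genes described in this section combines two filters: upregulation in vivo relative to in vitro culture in every compared strain, and expression in all clinical isolates. The sketch below shows that logic on toy data. The function name, the strain labels, the fold-change values, and the log2 fold-change cutoff are all hypothetical and are not taken from the study.

```python
def commonly_upregulated(log2_fc, expressed_in_all, min_log2_fc=1.0):
    """Genes upregulated in vivo in every compared strain and expressed in all isolates.

    log2_fc: {strain: {gene: log2(in vivo / in vitro)}}
    expressed_in_all: set of genes detected in every clinical isolate
    min_log2_fc: hypothetical cutoff, not taken from the paper
    """
    per_strain = [
        {gene for gene, fc in genes.items() if fc >= min_log2_fc}
        for genes in log2_fc.values()
    ]
    if not per_strain:
        return set()
    # Keep only genes passing the cutoff in every strain, then require
    # that they are also detected in all isolates.
    return set.intersection(*per_strain) & expressed_in_all


# Toy fold changes for two made-up strains (values are invented).
toy_fc = {
    "strain1": {"carA": 2.5, "argC": 1.2, "micA": -3.0},
    "strain2": {"carA": 1.8, "argC": 0.4, "micA": -2.1},
}
toy_common = commonly_upregulated(toy_fc, expressed_in_all={"carA", "argC"})
```

With these invented numbers only carA passes both filters, because argC falls below the assumed cutoff in one strain and micA is downregulated in vivo.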

Identification of genes that are exclusively expressed in the 
E. coli B1 and A phylogenetic groups. To evaluate whether the
UTI-associated isolates that group with B1 and A express a distinct
set of genes potentially relevant for the infection process, we extracted
from the list of genes that were found to be differentially
regulated among the 21 UPEC isolates those that were specifically
expressed in the 12 phylogroup A/B1 isolates. We identified 142
genes that were expressed at a significantly higher level in the 12
phylogroup A/B1 isolates (see Fig. S3 and Table S4A in the supplemental
material) compared to the other 9 isolates. Interestingly,
27 (19%) of these genes were associated with utilization of
alternative carbon sources, with in particular the complete set of 
the 12 genes required for phenylalanine degradation into succinyl 
coenzyme A (CoA) (tynA, feaB, paaKEACBGZJFH), indicating 
that those isolates have access to sufficient amounts of phenylalanine
in the urine. Of note, mutations in aroA have been used to
construct attenuated strains of various Gram-negative bacteria,
including E. coli (72). The attenuation is due to the inability
of aroA mutants to synthesize chorismate, which is a precursor
of important biochemical intermediates such as indole and
aromatic amino acids, many alkaloids, and other aromatic metabolites,
as well as folate and 2,3-dihydroxybenzoic acid used for
enterobactin biosynthesis. The availability of aromatic amino acids
in the urine may not only enable E. coli growth on
L-phenylalanine but also may save chorismate for iron chelator
biosynthesis as a crucial virulence trait. Interestingly, among the in
vivo expressed genes that were found to be enriched in the 12
phylogroup A/B1 isolates, we also found iroC and iroD, involved in
transport and processing of the siderophore salmochelin. iroC was
also upregulated in vivo compared to LB cultures in one of two
isolates, clustering with group B1, for which an in vitro transcriptional
profile was recorded. These results indicate that the siderophore
may play an important role in iron acquisition within the
subgroup of UPEC isolates that cluster in the A/B1 phylogroup
and that lack the common UPEC-associated virulence gene expression.

We could also detect 13 (9%) genes encoding fimbrial adhesins
mostly described as functional but cryptic, including the ycb
operon (ycbRSTUVF), part of the yra operon (yraH, yraJ, and
yraBC) (73, 74), and genes encoding a CS1-type fimbrial structure
(10CE_3624, -25, -26, and -27) that is usually associated with
enterotoxigenic E. coli (75). The enrichment in fimbriae genes in
the group of the 12 studied A/B1 isolates could reflect a characteristically
increased adhesion capability (76).

We also observed expression of Rhs element genes. Many bacteria
contain all or part of 5 Rhs elements, RhsA, -B, -C, -D, and -E,
scattered around the chromosome. Each Rhs region contains a
3.7-kb GC-rich DNA sequence that is 99% identical from one
element to another. These high levels of identity between Rhs proteins
were proposed to mediate major intraspecies chromosomal
rearrangements, hence their name (which stands for "recombination
hot spot") (77). However, high conservation of intact rhs
main genes (rhsA, -B, -C, -D, -E) also suggested that they could
contribute to a function subjected to selective pressure (78). Intriguingly,
rhs genes are not expressed to a detectable extent during
routine cultivation, and the conditions leading to Rhs expression
have not yet been elucidated (79). Our mRNA expression
analysis demonstrates that some Rhs elements are specifically expressed
in vivo in all 12 UPEC isolates belonging to the B1/A phylogroup.
Of note, recent studies suggested that expression of Rhs
elements is associated with bacterium-host or bacterium-bacterium
interactions, suggesting that such functions could contribute
to UTI (80, 81). Rhs proteins have furthermore been
associated with toxin-antitoxin (TA) activity and may be
delivered through a type 6 secretion apparatus that delivers effectors
into both prokaryotic and eukaryotic prey cells (82, 83). Some TA
systems have recently been shown to be important for colonization
of the bladder (yefM-yoeB and tomB-hha) and survival within
the kidneys (pasTI, previously named yfjGF) in a murine UTI
model (84). In this study, the chpAR, yafQ-dinJ, and hicBA TA
systems were found to be highly expressed in vivo, specifically in
the isolates clustering with phylogroups A/B1.

Identification of genes that are exclusively expressed in the 
E. coli from the B2 phylogenetic group. Besides identification of 
genes specifically expressed in the isolates clustering with the B1
and A phylogroups, we also identified 389 genes that were specif- 
ically expressed in strains clustering with the B2 phylogroup. A 
total of 208 out of the 389 genes encode hypothetical or conserved 
hypothetical proteins, and 102 are annotated as encoding putative
proteins (see Table S4B in the supplemental material). Apart from 
the well-described virulence genes, such as sat, encoding the se- 
creted autotransporter toxin (85), or usp, encoding the 
uropathogenic-specific protein (86), as well as yadC, yadN, and 
yfcPQU, encoding putative fimbria-like proteins (73, 87), we 
found a large number of genes encoding transporter and secretion 
systems. We found genes encoding components of type II general 
secretion pathways yheBDK and hofDFGHIK, also annotated as
gsp genes in the gspC-O operon involved in secretion of endochitinase
yheB (chiA) (88). Secreted chitinase is increasingly recognized
as a virulence factor of pathogenic bacteria infecting mammalian
hosts (89). We also found that components of the hypothetical
type VI secretion pathway (encoded by APECO1_3694, 3695,
3696, 3698, 3702, 3705, 3711, and 3712, E. coli APEC O1:K1:H7
gene IDs) were expressed in vivo. Furthermore, a large group of
genes encoding various transport systems, like yjcTU encoding a
D-allose ABC transporter and a putative iron compound ABC
transporter encoded by APECO1_3384 to APECO1_3389, as well
as a B2 phylogroup-specific expression of phosphotransferase systems
(PTS) responsible for transport of sugars into the bacterial
cell, were identified. In contrast to the isolates clustering with the
A/B1 phylogroups that exhibited extensive upregulation of the
phenylalanine degradation pathway, isolates clustering with the
B2 group seem to use various sugars as main carbon and energy
sources.

DISCUSSION 

A key to understanding microbial pathogenesis is to unravel how the
host environment impacts on the global gene expression pattern 
of a pathogen and to identify the gene repertoire whose expression 
is essential for the initiation and maintenance of an infection. In 
this study, we applied massive parallel cDNA sequencing (RNA- 
seq) to provide unbiased, deep, and accurate insight into the na- 
ture and the dimension of the uropathogenic E. coli gene expres- 
sion profile during an acute infection within the human host 
measured on bacteria present in voided urine. It is essential to 
indicate here that complex bacterial communities are present in 
the course of infection. In the current sampling procedure, we 
analyzed mainly planktonic bacteria, probably mixed with IBC 
from exfoliated epithelial bladder cells. It is possible that transcription
profiling of selected adhesive cell populations or of IBC only
would yield different gene expression results.

With a total of 21 in vivo transcriptomes, this study includes a
large number of bacterial strains studied with respect to pathogenic
E. coli gene expression following naturally occurring symptomatic 
human UTI. We applied RNA-seq to detect global transcriptional 
profiles independent of genome annotations and analyzed the in 
vivo transcriptomes to their full extent, including flexible genomic 
elements and expression of small regulatory RNAs. Furthermore, 
we identified single nucleotide polymorphisms (SNPs) in the bac- 
terial isolates and used their cumulative differences to provide a 
large number of discriminators. These discriminators represent 
typing markers to distinguish bacterial isolates and to assign each of the 21 
UPEC isolates to one of the four main phylogenetic groups, A, B1, 
B2, and D. 

Our findings on gene expression profiles in the urine of pa- 
tients suffering from a UTI are generally consistent with data gen- 
erated using murine models and a previous array-based transcrip- 
tome study of gene expression during a human UTI (14, 69, 70). 



When comparing the in vivo gene expression profiles to those 
recorded under laboratory medium conditions, we found that 
E. coli adapts to the conditions encountered within the human 
host by expressing genes required for rapid replication, acquisi- 
tion of iron, attachment to the uroepithelium, and evasion of the im- 
mune system, while variably expressing virulence genes. Analysis 
of sRNA expression revealed consistent expression of sRNAs in- 
volved in cell wall biosynthesis and integration of membrane pro- 
teins (glmZY) (42, 43) and in mediating adaptive physiology and 
timed virulence (csrBC) (44-46, 48, 49), underpinning the role of 
sRNAs in bacterial adaptation processes. 

Although it is widely accepted that UPEC strains originate 
from the distal gut microbiota, they seem to be capable of colo- 
nizing the urinary tract and to cause symptomatic infections of 
cystitis and pyelonephritis, because they are armed with extra vir- 
ulence genes that distinguish them from E. coli commensals (4). 
Several studies have demonstrated that the phylogroups differ with 
respect to the presence of virulence factors and ecological niches, 
and UPEC isolates have previously been found to be more preva- 
lent in group B2. In line with this, we found 7 UPEC isolates that 
grouped with the B2 phylogenetic group, and they expressed several 
virulence genes in vivo that have been associated with UPEC 
strains exhibiting full-pathogenic potential. Nevertheless, and in 
accordance with previous studies on atypical UTI patient popula- 
tions (90-92), in our study, which was performed on samples 
collected mainly from elderly patients, as many as 12 out of the 21 
UPEC isolates analyzed were assigned to the A and B1 phyloge- 
netic groups, which predominate among commensal E. coli. 

We found that E. coli isolates that have been assigned to the 
four phylogroups share a large general gene expression profile: 
overall, 2,589 genes were commonly transcribed in all isolates under 
in vivo conditions, which, depending on the genome size 
of the isolates, accounts for 52% to 67% of the transcribed ge- 
nome of the individual isolates. This conservation of a large part of 
the genome expression might also account for the finding that 
MALDI-TOF mass spectrometry, whose spectra probably correspond to 
more or less conserved housekeeping proteins, does not allow a 
robust discrimination into the previously identified phylogenetic 
groups B2, B1, A, and D. Although the 21 isolates share a large 
general gene expression profile, they do express clearly distinct 
flexible genomes. We found a strong correlation between the 
E. coli in vivo expression of the flexible genome and the genetic 
background of the isolate. However, as has been described before 
(37, 38), this correlation was dependent on the acquisition of 
group-specific gene repertoires in the flexible genomes rather than 
on a difference in their expression profile, possibly reflecting their 
evolution in distinct niches. 

Not only did our study identify previously described virulence- 
associated genes that were exclusively expressed in the 7 UPEC 
isolates clustering with group B2, but we also identified a novel set 
of genes overrepresented in those isolates. Among those, we found 
a large number of genes encoding transporter and secretion sys- 
tems, indicating that they play a role in pathogenicity of B2 group 
isolates. Furthermore, we identified a set of 142 genes whose ex- 
pression was demonstrated to be specifically enriched in the 12 
isolates that clustered with the A/B1 phylogroups, including genes 
encoding the phenylalanine degradation pathway, a siderophore, fim- 
brial adhesins, and Rhs elements. 

As more examples of in vivo transcriptional profiles accumu- 
late, greater insights into the role of new genes involved in micro- 
bial pathogenicity can be expected. However, further investiga- 
tions are required to unravel the specific impact of novel 
virulence-determining factors in the establishment and mainte- 
nance of the disease. Thereby, the application of in vivo RNA-seq 
seems to be particularly appropriate, as it affords detailed quanti- 
tative and qualitative sequence information that is independent of 
genome annotations and thus allows the establishment of full 
transcriptional profiles, including flexible genomic elements and 
expression of small regulatory RNAs. Furthermore, knowledge of 
SNPs as identified by the use of RNA-seq enables highly resolving 
phylogenetic grouping of clinical isolates and thus provides a basis 
for further global phenotypic-genotypic correlation studies. 

MATERIALS AND METHODS 

Ethical statement. Urine samples were collected from 21 outpatients with 
symptomatic urinary tract infections and subjected to bacterial RNA ex- 
traction procedures. Samples were collected according to the standards of 
the Declaration of Helsinki. The sample provided for this research was 
taken from the samples collected for routine microbiological tests, 
which are made on a regular basis; therefore, no additional procedures 
were carried out on the patients. Samples were analyzed upon informed 
consent from the patients. 

Bacterial RNA extraction and Illumina-based RNA sequencing. 
Urine samples (approximately 20 ml) were mixed with RNAprotect re- 
agent (Qiagen), incubated for 15 to 30 min at room temperature, and 
centrifuged for 15 min at 4,000 × g at 4°C, and the pellet was frozen at 
-70°C. RNA isolation was performed using the RNeasy minikit (Qiagen) 
according to the manufacturer's instructions with some modifications, 
and the DNA was removed by the use of a DNA-free kit (Ambion). En- 
richment for bacterial RNA was achieved by using the MicrobEnrich kit 
(Ambion) according to the manufacturer's instructions. 

Four UTI-associated isolates were also cultured in vitro in LB medium. 
RNAprotect reagent (Qiagen) was added to 3 ml of LB culture following 
growth to late exponential phase. All of the samples were treated for bac- 
terial RNA enrichment. After depletion of rRNA from the samples, total 
RNA was subjected to a commercial capture and depletion system (MI- 
CROBExpress bacterial RNA enrichment kit; Ambion), strand-specific 
bar-coded cDNA libraries were generated as described (51), and all sam- 
ples were single-end sequenced on an Illumina GenomeAnalyzer-IIx at a 
36-bp read length. Traces of human reads were removed from the raw 
sequence output by mapping all reads to the latest human genome release, 
GRCh37. Mapping was performed with Bowtie (93), allowing a maxi- 
mum of 2 mismatches per read. The sequence output after human read re- 
moval consisted of 61.01 million reads. 
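
The mismatch tolerance used for human read removal can be illustrated with a minimal sketch. Bowtie performs an index-based search rather than the direct pairwise comparison shown here, so the code only demonstrates the acceptance criterion (at most 2 mismatches per read); the function names are hypothetical and not part of the original pipeline.

```python
def mismatches(read: str, ref_window: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(read) != len(ref_window):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(read, ref_window))


def is_human_hit(read: str, ref_window: str, max_mismatches: int = 2) -> bool:
    """True if the read matches the reference window within the tolerance.

    max_mismatches=2 follows the cutoff stated in the text.
    """
    return mismatches(read, ref_window) <= max_mismatches
```

Reads for which such a hit exists in the human reference would be discarded before mapping to the E. coli gene set.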

E. coli reference sequences. We used the genomic sequences of 54 
E. coli isolates that were available for download from GenBank/EMBL 
(September 2012) as a reference to map all Illumina reads obtained in this 
study. The 54 E. coli genomes contain 252,623 genes, which gives an aver- 
age of 4,678 genes per strain (more details are presented in Table S5 in the 
supplemental material). With the aim to collapse those genes into gene 
families and define the genes present in all genomes, we first extracted all 
coding sequences (CDS) from the corresponding genomes. We then 
blasted the protein sequences found in all genomes against each other 
using BLASTP (94), discarding hits with <90% length and 50% sequence 
identity. Only if a gene product had a maximal reciprocal set of homologs 
in all other strains, 54 in total, was the corresponding gene considered 
"core"; otherwise, it was considered "flexible." Flexible CDS that had ho- 
mologs in 53 or 52 of the 54 E. coli genomes were reevaluated. The set of 
core genes detected in the reciprocal Blast search comprised 1,719 CDS, 
while there were an additional 363 CDS assigned to the core genome, 
summing to 2,082 core CDS (see Data Set S1 in the supplemental mate- 
rial). Apart from the 2,082 core CDS, we also identified 10,202 flexible 
CDS, including 3,257 singletons. Among the 54 completed E. coli genomes 
considered here, O26:H11/AP010953, O111:H-/AP010960, and O103:H2/AP010958 
exhibited the highest number of annotated small RNAs. We 
extracted the genomic sequences of 70 noncoding RNAs (ncRNAs) from 
the O26:H11 genome and performed BLASTN searches against each of 
the 54 genomes in order to define how many of these ncRNAs are present 
in all E. coli genomes. A total of 47 ncRNAs (45 small RNAs and 2 other ncRNAs: 
rne5, an RNase 5' untranslated region [UTR] element, and Alpha_RBS, a 
ribosomal binding site of the alpha operon) were found in all 54 genomes, and 
41 of them were expressed in at least one of our clinical isolates. These 
ncRNAs were included in the core genome that consisted of 2,129 (2,082 
CDS and 47 ncRNAs) genes. Finally, the sum of core (2,129) and flexible 
(10,202) genes amounted to 12,331 genes. The data representing the com- 
piled 12,331-gene list, the orthologous gene IDs (with sequence length 
composition and percentage of identity), and the gene expression levels of 
each of the 21 samples are presented as Data Sets S1A and S1B in the supple- 
mental material. 
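
The core/flexible assignment and the gene tallies reported in this paragraph can be illustrated with a minimal sketch. It assumes that the BLASTP results have already been reduced to a presence table (gene family ID mapped to the set of genomes carrying a qualifying homolog); the function name, the family IDs, and the "reevaluate" and "singleton" labels are hypothetical and do not reproduce the original scripts.

```python
from typing import Dict, Set

N_GENOMES = 54  # number of reference genomes stated in the text


def classify_families(presence: Dict[str, Set[str]],
                      n_genomes: int = N_GENOMES) -> Dict[str, str]:
    """Label each gene family by the number of genomes carrying a homolog."""
    labels = {}
    for family, genomes in presence.items():
        n = len(genomes)
        if n == n_genomes:
            labels[family] = "core"
        elif n >= n_genomes - 2:
            labels[family] = "reevaluate"  # homologs in 53 or 52 genomes
        elif n == 1:
            labels[family] = "singleton"   # a subset of the flexible CDS
        else:
            labels[family] = "flexible"
    return labels


# Bookkeeping of the counts reported in the text
core_cds = 1719 + 363            # reciprocal BLAST core plus reevaluated CDS
core_total = core_cds + 47       # plus ncRNAs found in all 54 genomes
all_genes = core_total + 10202   # plus flexible CDS
```

With these figures, `core_cds` is 2,082, `core_total` is 2,129, and `all_genes` is 12,331, in agreement with the totals given above.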

Mapping and gene expression profiling. The raw Illumina sequence 
reads (36-bp single end) were first split according to their bar codes using 
the fastq-mcf script of the ea-utils package (95), and then the bar code 
sequences were removed. We used the bowtie-build module in the Bowtie 
package (93) to build an indexed reference based on the 12,331 E. coli 
genes found in the 54 reference genomes as defined in the previous step. 
Mapping to the reference was performed using Bowtie with options "-m 1 
--best --strata" to allow only uniquely mapping hits and avoid uncertainties 
regarding repeat regions and ribosomal genes. Finally, the read counts per 
gene (RPG) were recorded for each annotated gene and were used as an 
input for differential gene expression calculations with the R package 
DESeq (96). Briefly, the RPG data were normalized for variation in library 
size/sequencing depth by using the estimateSizeFactors function of DESeq. 
Differentially expressed genes were identified using the nbinomTest func- 
tion based on the negative binomial model. Genes were considered to be 
differentially regulated only if their absolute logarithmic fold change over 
the control was higher than 1 at a false discovery rate of at most 5% 
(Benjamini and Hochberg P value correction provided in DESeq). In 
those clinical samples where no technical replicate was sequenced, the un- 
corrected P values at a 5% cutoff were used instead of the corrected ones. 
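
DESeq is an R package, and the analysis was run with its own functions. The Python sketch below only mirrors the logic of three steps described above, under stated assumptions: median-of-ratios size factors (the approach underlying DESeq's size factor estimation), the Benjamini and Hochberg adjustment, and the final cutoff. It does not reproduce the negative binomial test itself. The base of the logarithmic fold change is not stated in the text and is assumed here to be 2; all function names are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios size factors, one per sample.

    counts maps gene -> read counts per sample. Genes with a zero count
    in any sample are skipped because their geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest to the smallest P
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(log2_fc: float, padj: float,
                    lfc_cut: float = 1.0, fdr: float = 0.05) -> bool:
    """Apply the cutoffs from the text: |log fold change| > 1 and FDR <= 5%."""
    return abs(log2_fc) > lfc_cut and padj <= fdr
```

Dividing each sample's counts by its size factor gives depth-normalized values; for samples without technical replicates, the uncorrected P value would be passed to `is_differential` in place of the adjusted one.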

De novo assembly. All reads that did not map to the 12,331 E. coli 
genes were used as input for de novo transcriptome assembly with Velvet 
(97). We used a wide range of k-mers, 27 to 37, and a minimal transcript 
length of 100 bp. The assembled transcripts were blasted against all mi- 
crobial genes downloaded from the MBGD Database (98) using a mini- 
mal hit length of 100 bp and sequence similarity higher than 90%. After 
removing the ribosomal gene hits, we identified 156 additional nonredun- 
dant genes. 
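
The hit filter applied to the assembled transcripts can be sketched as follows. The sketch assumes, for illustration only, that the BLAST results are available in the standard 12-column tabular format (query, subject, percent identity, alignment length, ...); the text does not state which output format was used, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def filter_hits(lines: Iterable[str],
                min_length: int = 100,
                min_identity: float = 90.0) -> List[Tuple[str, str]]:
    """Keep (query, subject) pairs that pass both cutoffs from the text.

    A hit is kept if its alignment length is at least min_length bp and
    its percent identity is higher than min_identity.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if length >= min_length and identity > min_identity:
            kept.append((query, subject))
    return kept
```

Ribosomal gene hits and redundant entries would still have to be removed from the returned list, as described above.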

Phylogenetic tree. A consensus sequence for a total of 336 genes (those that 
had at least 80% sequencing coverage across the 21 UTI isolates) was 
generated by the use of the mpileup option in the SAMtools package (99). 
The corresponding orthologous gene sequences extracted from the 54 
E. coli genomes were subsequently included. The sequence redundancies 
and gaps in sequence coverage were removed, resulting in a 2.3-Mb multi- 
Fasta file used for multiple alignment with Clustal Omega (100). The 
alignment was subjected to further refinement with RAxML (101), per- 
forming 500 bootstrapping steps and testing 50 trees. The consensus tree 
was drawn with Dendroscope (102). 
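
The gene selection step can be sketched as a simple filter. The sketch assumes one possible reading of the criterion, namely that a gene is retained only if at least 80% of its length is covered by reads in every one of the isolates; the input table and the function name are hypothetical.

```python
from typing import Dict, List


def select_genes(coverage: Dict[str, List[float]],
                 min_fraction: float = 0.80) -> List[str]:
    """Return genes whose covered fraction is >= min_fraction in all isolates.

    coverage maps gene ID -> covered fraction of the gene per isolate (0 to 1).
    """
    return sorted(gene for gene, fractions in coverage.items()
                  if fractions and min(fractions) >= min_fraction)
```

Consensus sequences would then be generated only for the genes returned by this filter.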

Gene ontology terms. We downloaded the current UniProt Gene On- 
tology (GO) knowledgebase (103). Using custom Perl scripts, we mapped 
the gene locus IDs (in KEGG format) to their UniProt identifiers and 
extracted the relevant GO IDs. The GO ID lists were summarized using 
the QuickGO browser (104). 

MALDI-TOF mass spectrometry biotyping. Intact cell smears of 19 
E. coli isolates (for two patient samples, no bacterial cultures were pre- 
served) were prepared in 10 biological replicates on MALDI target plates 
(MSP 96 polished steel target; Bruker Daltonics, Bremen, Germany) by 
following standard procedures. The air-dried smears were overlaid with 
1 µl of saturated alpha-cyano-4-hydroxycinnamic acid matrix solution. 
E. coli DH5α bacterial test standard (Bruker Daltonics) was used for ex- 
ternal calibration. Bacterial profile spectra were acquired in duplicate 
using a MicroflexLT MALDI-TOF device (Bruker Daltonics) for analysis 
in the mass range between 3 and 15,000 m/z with the Biotyper 3.1 software 
(Bruker Daltonics). In a quality-control step, spectra characterized by 
excessive noise and/or Biotyper scores indicating unreliable identification 
(<1.7) were excluded from our profile spectra library. We then generated 
reference spectra of each strain from the remaining 322 profile spectra 
using Biotyper MSP generation standard settings (105), yielding reference 
spectra for classification of our closely related E. coli strains. In a further 
quality-control step, we validated that our E. coli strains clustered together 
with the 11 E. coli strains among the more than 4,000 strains in the Bio- 
typer database. The 19 strain reference spectra were clustered based on 
Minkowski distances and group averages. 
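
Clustering based on Minkowski distances and group averages can be illustrated with a small, self-contained sketch of agglomerative clustering with average (UPGMA-type) linkage. It is not the Biotyper implementation; the exponent p of the Minkowski distance is not stated in the text and defaults here to 2 (Euclidean) as an assumption, and the spectra are represented by hypothetical intensity vectors of equal length.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[str, ...]


def minkowski(u: Sequence[float], v: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def average_linkage(vectors: Dict[str, Sequence[float]],
                    p: float = 2.0) -> List[Tuple[Cluster, Cluster, float]]:
    """Agglomerative clustering with group-average linkage.

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of strain labels.
    """
    clusters: List[Cluster] = [(name,) for name in sorted(vectors)]

    def dist(a: Cluster, b: Cluster) -> float:
        # Mean of all pairwise distances between members of a and b
        pairs = [(x, y) for x in a for y in b]
        return sum(minkowski(vectors[x], vectors[y], p)
                   for x, y in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: dist(*ab))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges
```

The sequence of merges and their distances defines the dendrogram; for a data set of 19 reference spectra the quadratic search used here is adequate, although dedicated libraries would be preferable for larger collections.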

Nucleotide sequence accession number. The sequencing data have 
been submitted to SRA under the project accession no. SRP029244. 